An Extension of a Method of Hardin and Rocke, with an Application to Multivariate Outlier Detection via the IRMCD Method of Cerioli
Abstract
Hardin and Rocke investigated the distribution of the robust Mahalanobis squared distance (RSD) computed using the minimum covariance determinant (MCD) estimator. They showed that the distribution of RSDs for outlying observations not part of the MCD subset is well-approximated by an F distribution. They developed a methodology to adjust an asymptotic formula for the degrees of freedom parameters of this F distribution to provide correct parameter values in small-to-moderate samples. This methodology was developed for the maximum breakdown point version of the MCD, which is based on approximately half of the observations. Whether the approximation remains accurate for the MCD using larger subsets of the data is an open question. We show that their approximation works quite well for the more general MCD, but can be noticeably inaccurate for sample sizes less than 250 and when the MCD estimate uses nearly all of the observations. Motivated by the desire to apply RSD-based outlier detection tests to financial asset return and factor exposure data sets whose typical sample sizes are smaller than 250, we develop a more general correction procedure that is accurate across a wider range of sample sizes and MCD subset sizes than the Hardin and Rocke approach. We use our approach to extend Cerioli’s IRMCD procedure for accurate RSD-based outlier tests to arbitrary MCD subset sizes.
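To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' procedure. It finds the exact MCD subset of a tiny bivariate sample by exhaustive search and computes each observation's squared Mahalanobis distance from the resulting location and scatter. The function names, the toy data, and the subset size h = 6 are assumptions chosen for illustration. No consistency or small-sample correction factor is applied, and the F-distribution calibration of Hardin and Rocke is not reproduced here.

```python
from itertools import combinations


def mean_cov_2d(points):
    """Sample mean and 2x2 covariance (divisor n - 1) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def det_2d(cov):
    """Determinant of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2


def exact_mcd_2d(points, h):
    """Exhaustive MCD: the h-subset whose covariance determinant is smallest.

    Only feasible for very small samples; hypothetical helper for illustration.
    """
    best = min(
        combinations(range(len(points)), h),
        key=lambda idx: det_2d(mean_cov_2d([points[i] for i in idx])[1]),
    )
    center, cov = mean_cov_2d([points[i] for i in best])
    return set(best), center, cov


def squared_distance(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det_2d(cov)


# Toy data (assumed): eight points near the origin plus two planted outliers.
data = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.3), (0.3, 0.2), (-0.2, -0.2),
    (0.1, -0.3), (-0.3, 0.0), (0.0, -0.1), (5.0, 5.2), (5.3, 4.9),
]

subset, center, cov = exact_mcd_2d(data, h=6)
distances = [squared_distance(p, center, cov) for p in data]
for i, d2 in enumerate(distances):
    flag = "in MCD subset" if i in subset else "outside subset"
    print(f"obs {i}: squared distance = {d2:.2f} ({flag})")
```

In this sketch the two planted outliers fall outside the MCD subset and receive much larger distances than the remaining points. Deciding which distances are significantly large is the calibration question the abstract addresses.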
Similar resources
Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator
Mahalanobis-type distances in which the shape matrix is derived from a consistent high-breakdown robust multivariate location and scale estimator can be used to find outlying points. Hardin and Rocke (http://www.cipic.ucdavis.edu/∼dmrocke/preprints.html) developed a new method for identifying outliers in a one-cluster setting using an F distribution. We extend the method to the multiple cluster c...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate predictions. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Detecting Suspicious Card Transactions in unlabeled data of bank Using Outlier Detection Techniqes
With the advancement of technology, the use of ATM and credit cards has increased. Cyber fraud and theft are threats that result from using these technologies. It is therefore necessary to use fraud detection algorithms to prevent fraudulent use of bank cards. Credit card fraud can be thought of as a form of identity theft that consists of an unauthorized access to another person's ...
Application of Recursive Least Squares to Efficient Blunder Detection in Linear Models
In many geodetic applications, a large number of observations are measured to estimate the unknown parameters. The unbiasedness of the estimated parameters is ensured only if the observations contain no bias (e.g. a systematic effect) and no falsifying observations, also known as outliers. One of the most important steps towards obtaining a coherent analysis for the parameter estimation is th...
Multivariate Outlier Detection With High-Breakdown Estimators
Andrea Cerioli is Professor, Dipartimento di Economia, Sezione di Statistica e Informatica, Università di Parma, Via Kennedy 6, 43100 Parma, Italy. The author expresses his gratitude to three anonymous reviewers for insightful comments that led to many improvements in the article. The author also thanks Marco Riani an...